Make Char.isLower and Char.isUpper Unicode-aware #970
This allows people with non-ASCII alphabets to work with `Char.isLower` and `Char.isUpper`. It uses `toUpper` and `toLower` underneath, which use JavaScript's `String.prototype.toLowerCase()` and `String.prototype.toUpperCase()`. The second condition in the functions is there to distinguish between characters that have an upper/lower-case pairing and those that don't (`'0' == Char.toLower '0'`, but we don't want `isLower '0'` to be true).
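The two-condition check can be tried outside Elm. Below is a minimal TypeScript sketch, assuming single-character strings and the same JavaScript `toUpperCase()`/`toLowerCase()` methods that `Char.toUpper` and `Char.toLower` rely on. The names `isUpper` and `isLower` are hypothetical stand-ins for illustration, not the PR's actual Elm code.

```typescript
// Hypothetical stand-ins for the proposed Char.isUpper / Char.isLower.
// Assumption: `ch` holds exactly one character.

function isUpper(ch: string): boolean {
  // First condition: upper-casing leaves the character unchanged.
  // Second condition: the character has a distinct lower-case form,
  // which rules out caseless characters such as digits.
  return ch === ch.toUpperCase() && ch !== ch.toLowerCase();
}

function isLower(ch: string): boolean {
  // Mirror image of isUpper.
  return ch === ch.toLowerCase() && ch !== ch.toUpperCase();
}

console.log(isUpper("\u017D")); // "Ž", a non-ASCII capital letter
console.log(isLower("\u017E")); // "ž", its lower-case counterpart
console.log(isLower("0"));      // caseless: "0" equals its own lower-case form
```

Without the second condition, `isLower("0")` would be true, because `"0"` is its own lower-case form.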
Is related to #385. |
What's considered an uppercase character depends on your locale. This PR is still a major improvement. Related to #942. |
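To make the locale point concrete, here is a small TypeScript sketch comparing JavaScript's locale-insensitive `toUpperCase()` with `toLocaleUpperCase()`. It assumes the runtime ships Turkish (`"tr"`) locale data; where it does not, the second call is expected to fall back to the default mapping, so no specific output is claimed for it.

```typescript
// Locale-insensitive mapping: uses the Unicode default case conversion.
const defaultUpper: string = "i".toUpperCase();

// Locale-sensitive mapping. With Turkish locale data available, "i" is
// expected to map to the dotted capital "İ" (U+0130) rather than "I".
// Assumption: the runtime has data for "tr"; otherwise the result
// matches the default mapping.
const turkishUpper: string = "i".toLocaleUpperCase("tr");

console.log(defaultUpper);
console.log(turkishUpper);
console.log(defaultUpper === turkishUpper);
```

A check built on `toUpperCase()`, as in this PR, always takes the first path and never consults the user's locale.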
For future reference, the
So it seems that I do not want us to theorize about these things here. The next step is to find nice links that describe:
I would prefer to understand the problem more completely before changing things. |
From my cursory googling and research:

1. How is "upper case" defined?
I think this FAQ is the link you want. In short, yes, there is a big table. Three, in fact.
Here is the relevant section of the standard. It has some sense of inter-version stability between the Unicode versions.

2. How is "locale" defined?
Again, a Unicode FAQ; and this time there's a whole homepage. You can download the current version; there are a lot of XML files inside with various data (casing of dates / languages / ..., etc.), to be interpreted according to LDML. They are also transformed from the XML into JSON, which might be a better fit for Elm? |
We might try to be extra-pure and host the big table, in Elm format, ourselves, but that would make elm/core very big, I imagine. The browser already has that cached in the form of The I mean, even

    main : Html msg
    main =
        "2018-05-10"
            |> Date.fromString
            |> toString
            |> Html.text

shows |
I ran into the same issue when doing the exercise in the forms section, namely checking the password for an uppercase character. At first glance, I thought this was a serious omission. Then I wondered whether it is worth letting users set Unicode passwords at all. In any case, that is not decided at the front-end validation stage, but much earlier. This limitation may therefore frustrate developers from non-English-speaking regions, and it counts against using Elm as the main front-end stack in an enterprise environment, which in turn affects the popularity and development of the language. Still, I believe that such an annoying flaw will not remain a problem for Elm. Meanwhile. |
FYI, this package can currently be used to deal with unicode strings: https://package.elm-lang.org/packages/BrianHicks/elm-string-graphemes/latest/ |
Lines 84 to 85 in e47edeb
Wouldn't
Same applies to (I tried using code comments, didn't work, idk why) Edit: The one scenario where this might make a difference is if there's a "middle case" character that has both an upper and a lower case variant. But I don't think such a character exists, and even if it does, should |
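The "middle case" scenario can be probed directly. Unicode does define titlecase digraphs such as U+01C5 "ǅ", which sits between "Ǆ" (U+01C4) and "ǆ" (U+01C6). The TypeScript sketch below, assuming the runtime follows the Unicode default case mappings, compares the PR's two-condition check with a single-condition variant. That single-condition variant is a hypothetical simplification chosen for illustration, not a quote of the suggestion above.

```typescript
// U+01C5 "ǅ": a titlecase digraph with both an upper-case and a lower-case form.
const titlecase: string = "\u01C5";

const hasDistinctUpper: boolean = titlecase !== titlecase.toUpperCase();
const hasDistinctLower: boolean = titlecase !== titlecase.toLowerCase();

// Two-condition check, as described in the PR (hypothetical stand-in).
const twoConditionIsUpper: boolean =
  titlecase === titlecase.toUpperCase() && titlecase !== titlecase.toLowerCase();

// Hypothetical single-condition variant: "has a distinct lower-case form".
const singleConditionIsUpper: boolean = titlecase !== titlecase.toLowerCase();

console.log(hasDistinctUpper, hasDistinctLower);
console.log(twoConditionIsUpper, singleConditionIsUpper);
```

For such a character, the two checks disagree: the two-condition form reports it as neither upper nor lower, while the single-condition form would report it as upper case.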
This does not fully solve the problem of detecting case in Unicode, as it can also vary by locale. This does make the isUpper/Lower and toUpper/Lower functions consistent.
|
@miniBill, good to know. So, would you argue that |
https://ellie-app.com/gBmgVVFzhbRa1 contains a table of all the 1441 codepoints that give wrong results with the current proposal. I personally think the proposal is the best compromise between accuracy and size/speed. Getting better results belongs in external packages (like elm-unicode). |
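As an illustration of the kind of mismatch such a table records, the TypeScript sketch below compares a stand-in for the proposed check against the Unicode general category `Ll` (Lowercase_Letter). The character U+0138 "ĸ" was picked for illustration because it is a lower-case letter with no upper-case mapping. Whether it appears in the linked table has not been verified here, and `proposedIsLower` is a hypothetical name.

```typescript
// Hypothetical stand-in for the proposed Char.isLower.
// Assumption: `ch` holds exactly one character.
function proposedIsLower(ch: string): boolean {
  return ch === ch.toLowerCase() && ch !== ch.toUpperCase();
}

// Unicode general category Ll, via a property escape. Built with the
// RegExp constructor so the pattern is only interpreted at run time.
const lowercaseLetter: RegExp = new RegExp("^\\p{Ll}$", "u");

// "ĸ" LATIN SMALL LETTER KRA: category Ll, but no upper-case mapping.
const kra: string = "\u0138";

console.log(lowercaseLetter.test(kra)); // category-based answer
console.log(proposedIsLower(kra));      // pairing-based answer
```

The category-based test accepts the character, while the pairing-based check rejects it because upper-casing leaves it unchanged. Covering such cases needs the Unicode tables themselves, which is the size/speed trade-off mentioned above.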
This PR causes a compile error when it is used: elm-janitor/apply-patches#1 |
@rupertlssmith I can't edit this PR's code anymore, see #1138 for compilable code. |
EDIT: I can't update the code of this PR anymore; there is #1138 with a fix for the `==` and `/=` import.